# Low-memory optimization
## Mistralai Mistral Small 3.2 24B Instruct 2506 GGUF
llama.cpp imatrix-quantized version of Mistral-Small-3.2-24B-Instruct-2506, offering multiple quantization types to suit different hardware requirements.
- License: Apache-2.0
- Tags: Large Language Model · Multilingual
- Author: bartowski · Downloads: 3,769 · Likes: 12
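A recurring theme in these listings is choosing a quantization type that fits the available memory. As a rough rule of thumb, a GGUF file's size is about parameter count × bits per weight ÷ 8. A minimal sketch (the bits-per-weight figures are illustrative approximations, not exact values for any specific repository's files):

```python
# Rough GGUF file-size estimate: params * bits-per-weight / 8 bytes.
# The bits-per-weight values below are approximations for illustration;
# real GGUF quants mix tensor types, so actual file sizes differ somewhat.
APPROX_BPW = {
    "Q8_0": 8.5,
    "Q4_K_M": 4.8,
    "Q4_0": 4.5,
}

def approx_size_gib(params_billions: float, quant: str) -> float:
    """Approximate GGUF file size in GiB for a given quant type."""
    total_bits = params_billions * 1e9 * APPROX_BPW[quant]
    return total_bits / 8 / 2**30

# A 24B model at ~4.5 bits/weight lands near 12-13 GiB on disk, while the
# same model at Q8_0 roughly doubles that, which is why each listing above
# ships several quantization options.
```

This is only a disk-size heuristic; actual runtime memory also includes the KV cache and compute buffers.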
## Gemma 3 12B FornaxV.2 QAT CoT Q4 0 GGUF
An experimental small reasoning model designed to run on 8 GiB consumer GPUs with general inference capabilities. Through supervised fine-tuning (SFT) on high-quality reasoning trajectories, it generalizes its reasoning ability to multiple tasks.
- Tags: Large Language Model
- Author: ConicCat · Downloads: 98 · Likes: 1
## Huihui Ai Qwen3 14B Abliterated GGUF
Qwen3-14B-abliterated is a quantized version of the Qwen3-14B model, produced with llama.cpp and offering multiple quantization options to meet different performance requirements.
- License: Apache-2.0
- Tags: Large Language Model
- Author: bartowski · Downloads: 6,097 · Likes: 5
## Qwen Qwen3 32B GGUF
Quantized version of Qwen/Qwen3-32B, produced with llama.cpp and supporting multiple quantization types for different hardware requirements.
- License: Apache-2.0
- Tags: Large Language Model
- Author: bartowski · Downloads: 49.13k · Likes: 35
## Llama 2 7b Chat Hf GGUF
Llama 2 is a 7B-parameter large language model developed by Meta, offered here in multiple quantized versions to accommodate different hardware requirements.
- Tags: Large Language Model · English
- Author: Mungert · Downloads: 1,348 · Likes: 3
## Phi 4 GGUF
phi-4 is an open-source language model developed by Microsoft Research, focused on high-quality data and reasoning capabilities and suitable for memory- and compute-constrained environments.
- License: MIT
- Tags: Large Language Model · Multilingual
- Author: Mungert · Downloads: 1,508 · Likes: 3
## RWKV7 Goose World3 2.9B HF GGUF
RWKV-7 model in the flash-linear-attention format, supporting multilingual text generation tasks.
- License: Apache-2.0
- Tags: Large Language Model · Multilingual
- Author: Mungert · Downloads: 14.51k · Likes: 16
## Thedrummer Cydonia 24B V2.1 GGUF
Cydonia-24B-v2.1 is a 24B-parameter large language model, processed with llama.cpp's imatrix quantization and offered in multiple quantized versions to suit different hardware requirements.
- License: Other
- Tags: Large Language Model
- Author: bartowski · Downloads: 4,417 · Likes: 7
## Rombo Org Rombo LLM V3.1 QWQ 32b GGUF
Rombo-LLM-V3.1-QWQ-32b is a 32B-parameter large language model, processed with llama.cpp's imatrix quantization and offered in multiple quantized versions to accommodate different hardware requirements.
- License: Apache-2.0
- Tags: Large Language Model
- Author: bartowski · Downloads: 2,132 · Likes: 5
## Nera Noctis 12B GGUF
llama.cpp imatrix-quantized version of Nera_Noctis-12B, based on the Nitral-AI/Nera_Noctis-12B model and supporting English text generation tasks.
- License: Other
- Tags: Large Language Model · English
- Author: bartowski · Downloads: 64 · Likes: 6
## Aura 4B GGUF
Aura-4B is a quantized version of AuraIndustries/Aura-4B, produced with llama.cpp imatrix quantization, supporting multiple quantization types and suited to text generation tasks.
- License: Apache-2.0
- Tags: Large Language Model · English
- Author: bartowski · Downloads: 290 · Likes: 8